Document Filtering for Long-tail Entities
Filtering relevant documents with respect to entities is an essential task in
the context of knowledge base construction and maintenance. It entails
processing a time-ordered stream of documents that might be relevant to an
entity in order to select only those that contain vital information.
State-of-the-art approaches to document filtering for popular entities are
entity-dependent: they rely on and are also trained on the specifics of
differentiating features for each specific entity. Moreover, these approaches
tend to use so-called extrinsic information such as Wikipedia page views and
related entities, which is typically available only for popular head
entities. Entity-dependent approaches based on such signals are therefore
ill-suited as filtering methods for long-tail entities. In this paper we
propose a document filtering method for long-tail entities that is
entity-independent and thus also generalizes to unseen or rarely seen entities.
It is based on intrinsic features, i.e., features that are derived from the
documents in which the entities are mentioned. We propose a set of features
that capture informativeness, entity-saliency, and timeliness. In particular,
we introduce features based on entity aspect similarities, relation patterns,
and temporal expressions and combine these with standard features for document
filtering. Experiments following the TREC KBA 2014 setup on a publicly
available dataset show that our model is able to improve the filtering
performance for long-tail entities over several baselines. Results of applying
the model to unseen entities are promising, indicating that the model is able
to learn the general characteristics of a vital document. The overall
performance across all entities---i.e., not just long-tail entities---improves
upon the state-of-the-art without depending on any entity-specific training
data.
Comment: CIKM2016, Proceedings of the 25th ACM International Conference on
Information and Knowledge Management. 201
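The entity-independent idea above can be illustrated as a linear score over purely intrinsic features, i.e., signals computed from the document text alone. The feature names, the temporal-expression regex, and the weights below are illustrative stand-ins, not the paper's actual feature set or trained model.

```python
import re

def intrinsic_features(doc: str, entity: str) -> dict:
    """Derive entity-independent features from the document text alone."""
    text = doc.lower()
    tokens = text.split()
    mentions = text.count(entity.lower())
    first_pos = text.find(entity.lower())
    return {
        # entity-saliency: how often and how early the entity is mentioned
        "mention_rate": mentions / max(len(tokens), 1),
        "early_mention": 1.0 if 0 <= first_pos < len(doc) // 4 else 0.0,
        # timeliness: presence of temporal expressions (very rough proxy)
        "has_temporal": 1.0 if re.search(r"\b(today|yesterday|20\d\d)\b", text) else 0.0,
    }

def vital_score(features: dict, weights: dict) -> float:
    """Linear combination standing in for a trained classifier's decision score."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())

doc = ("In 2014 the startup announced today that Acme Robotics raised new funding. "
       "Acme Robotics will expand.")
feats = intrinsic_features(doc, "Acme Robotics")
score = vital_score(feats, {"mention_rate": 5.0, "early_mention": 1.0, "has_temporal": 1.0})
print(score > 1.0)  # the toy document scores above an (arbitrary) vitality threshold
```

Because nothing in `intrinsic_features` is specific to one entity, the same scorer applies unchanged to unseen or rarely seen entities, which is the point of the entity-independent design.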
Towards a framework for critical citizenship education
Increasingly, countries around the world are promoting forms of "critical" citizenship in the planned curricula of schools. However, the intended meaning behind this term varies markedly and can range from a set of creative and technical skills under the label "critical thinking" to a desire to encourage engagement, action and political emancipation, often labelled "critical pedagogy". This paper distinguishes these manifestations of the "critical" and, based on an analysis of the prevailing models of critical pedagogy and citizenship education, develops a conceptual framework for analysing and comparing the nature of critical citizenship.
Non-invasive beam profile monitor for medical accelerators
A beam profile monitor based on a supersonic gas-curtain is currently under development for transverse profile
diagnostics of electron and proton beams in the High Luminosity LHC. This monitor uses a thin supersonic gas
curtain that crosses the primary beam to be characterized at an angle of 45 degrees. The fluorescence caused
by the interaction between the beam and gas-curtain is detected using a specially designed imaging system to
determine the 2D transverse profile of the primary beam. Another prototype monitor based on beam induced
ionization is installed at The Cockcroft Institute. This paper presents the design features of both monitors, the
gas-jet curtain formation and various experimental tests, including profile measurements of an electron beam,
using helium, nitrogen and neon as gases. Such a non-invasive online beam profile monitor would also be highly
desirable for medical LINACs and storage rings, as it can characterize the beam without stopping machine
operation. The paper discusses opportunities for simplifying the monitor design for integration into a medical
accelerator and the expected monitor performance.
Mining clinical relationships from patient narratives
Background
The Clinical E-Science Framework (CLEF) project has built a system to extract clinically significant information from the textual component of medical records in order to support clinical research, evidence-based healthcare and genotype-meets-phenotype informatics. One part of this system is the identification of relationships between clinically important entities in the text. Typical approaches to relationship extraction in this domain have used full parses, domain-specific grammars, and large knowledge bases encoding domain knowledge. In other areas of biomedical NLP, statistical machine learning (ML) approaches are now routinely applied to relationship extraction. We report on the novel application of these statistical techniques to the extraction of clinical relationships.
Results
We have designed and implemented an ML-based system for relation extraction, using support vector machines, and trained and tested it on a corpus of oncology narratives hand-annotated with clinically important relationships. Over a class of seven relation types, the system achieves an average F1 score of 72%, only slightly behind an indicative measure of human inter-annotator agreement on the same task. We investigate the effectiveness of different features for this task, how extraction performance varies between inter- and intra-sentential relationships, and examine the amount of training data needed to learn various relationships.
Conclusion
We have shown that it is possible to extract important clinical relationships from text, using supervised statistical ML techniques, at levels of accuracy approaching those of human annotators. Given the importance of relation extraction as an enabling technology for text mining, and given also the ready adaptability of systems based on our supervised learning approach to other clinical relationship extraction tasks, this result has significance for clinical text mining more generally, though further work to confirm our encouraging results should be carried out on a larger sample of narratives and relationship types.
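The kind of shallow lexical and positional features typically fed to an SVM for relation extraction can be sketched as follows; the feature shapes, the single-token entity assumption, and the example sentence are illustrative, and neither the CLEF corpus nor the actual SVM learner is reproduced here.

```python
def relation_features(sentence: str, entity1: str, entity2: str) -> dict:
    """Shallow features for a candidate (entity1, entity2) pair in one sentence.

    Assumes single-token entity mentions for simplicity; a real system would
    operate on annotated spans and add syntactic and semantic features.
    """
    tokens = sentence.split()
    try:
        i, j = tokens.index(entity1), tokens.index(entity2)
    except ValueError:
        return {}  # one of the entities is not in this sentence
    lo, hi = sorted((i, j))
    return {
        # positional signal: how far apart the two mentions are
        "token_distance": hi - lo,
        # lexical signal: the words between the mentions often carry the relation
        "words_between": " ".join(tokens[lo + 1:hi]),
        # order signal: which mention comes first
        "e1_first": i < j,
    }

feats = relation_features("radiotherapy treats tumour", "radiotherapy", "tumour")
print(feats)  # → {'token_distance': 2, 'words_between': 'treats', 'e1_first': True}
```

Feature dictionaries like this one would then be vectorized and passed to an SVM; intra-sentential pairs yield richer `words_between` evidence than inter-sentential ones, which is consistent with the performance gap the abstract describes.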
Reflections on the 'History and Historians' of the black woman's role in the community of slaves: enslaved women and intimate partner sexual violence
Taking as points of inspiration Peter Parish's 1989 book, Slavery: History and Historians, and Angela Davis's seminal 1971 article, "Reflections on the black woman's role in the community of slaves," this article probes, both historiographically and methodologically, some of the challenges faced by historians writing about the lives of enslaved women, through a case study of intimate partner violence among enslaved people in the antebellum South. Because rape and sexual assault have been defined in the past as non-consensual sexual acts supported by surviving legal evidence (generally testimony from court trials), it is hard for historians to research rape and sexual violence under slavery (especially marital rape), as there was no legal standing for the rape of enslaved women or the rape of any woman within marriage. This article suggests enslaved women recognized that black men could be perpetrators of sexual violence and simultaneously victims of the system of slavery. It also argues that women stoically tolerated being forced into intimate relationships, sometimes even staying with "husbands" imposed upon them after emancipation.
Finishing the euchromatic sequence of the human genome
The sequence of the human genome encodes the genetic instructions for human physiology, as well as rich information about human evolution. In 2001, the International Human Genome Sequencing Consortium reported a draft sequence of the euchromatic portion of the human genome. Since then, the international collaboration has worked to convert this draft into a genome sequence with high accuracy and nearly complete coverage. Here, we report the result of this finishing process. The current genome sequence (Build 35) contains 2.85 billion nucleotides interrupted by only 341 gaps. It covers ∼99% of the euchromatic genome and is accurate to an error rate of ∼1 event per 100,000 bases. Many of the remaining euchromatic gaps are associated with segmental duplications and will require focused work with new methods. The near-complete sequence, the first for a vertebrate, greatly improves the precision of biological analyses of the human genome, including studies of gene number, birth and death. Notably, the human genome seems to encode only 20,000-25,000 protein-coding genes. The genome sequence reported here should serve as a firm foundation for biomedical research in the decades ahead.
Extracting Emerging Knowledge from Social Media
Massive data integration technologies have recently been used to produce very large ontologies. However, knowledge in the world continuously evolves, and ontologies are largely incomplete when it comes to low-frequency data belonging to the so-called long tail. Socially produced content is an excellent source for discovering emerging knowledge: it is huge, and it immediately reflects the relevant changes in which emerging entities hide. Thus, we propose a method for discovering emerging entities by extracting them from social content.
Once instrumented by experts through very simple initialization, the method is capable of finding emerging entities; we use a purely syntactic method as a baseline, and we propose several semantics-based variants. The method uses seeds, i.e., prototypes of emerging entities provided by experts, for generating candidates; it then associates candidates with feature vectors, built from terms occurring in their social content, and ranks the candidates by their distance from the centroid of the seeds, returning the top candidates as the result. The method can be continuously or periodically iterated, using the results as new seeds. We validate our method by applying it to a set of diverse domain-specific application scenarios, spanning fashion, literature, and exhibitions.
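The centroid-distance ranking step described above can be sketched in a few lines; the vectors and candidate names are toy values, and the term-to-vector featurization of social content is assumed to have happened upstream.

```python
import math

def centroid(vectors):
    """Component-wise mean of the seed feature vectors."""
    dims = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dims)]

def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def rank_candidates(seed_vectors, candidates, top_k=2):
    """Rank candidate entities by distance of their term-feature vector
    from the centroid of the expert-provided seeds (closest first)."""
    c = centroid(seed_vectors)
    ranked = sorted(candidates.items(), key=lambda kv: distance(kv[1], c))
    return [name for name, _ in ranked[:top_k]]

seeds = [[1.0, 0.0, 1.0], [0.8, 0.2, 1.0]]   # prototype emerging entities
candidates = {
    "brand_a": [0.9, 0.1, 1.0],   # close to the seed centroid
    "brand_b": [0.0, 1.0, 0.0],   # far away
}
print(rank_candidates(seeds, candidates, top_k=1))  # → ['brand_a']
```

Feeding the returned top candidates back in as new seeds gives the continuous or periodic iteration the abstract mentions.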